Binomial Distribution in R

Introduction

The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent trials, where each trial has exactly two possible outcomes: success or failure.

It applies when:

There is a fixed number of trials (e.g., flipping a coin 10 times)
Each trial is independent
The probability of success (p) is constant for all trials
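
To make these conditions concrete, the coin-flip setting can be simulated. The following is a minimal sketch, not part of the original analysis: the seed, the number of repetitions (10,000), and the variable names are illustrative assumptions.

```r
# Simulate the coin-flip setting: 10 independent flips with P(head) = 0.5.
set.seed(42)             # arbitrary seed, used only for reproducibility

n_trials  <- 10          # fixed number of trials
p_success <- 0.5         # constant probability of success on every trial

# One experiment: 10 individual flips (1 = head, 0 = tail)
flips <- rbinom(n_trials, size = 1, prob = p_success)
heads_once <- sum(flips)

# Many experiments: each value is the head count of one 10-flip experiment
heads_many <- rbinom(10000, size = n_trials, prob = p_success)

# Relative frequency of each head count across the simulated experiments
sim_freq <- table(factor(heads_many, levels = 0:n_trials)) / length(heads_many)
print(round(sim_freq, 3))
```

The relative frequencies should land close to the theoretical binomial probabilities, with small differences due to sampling noise.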

Creating the data frame

# Parameters
n <- 10            # number of trials
p <- 0.5           # probability of success (head)

# Create values for X = 0 to 10
x <- 0:n

# Binomial probabilities
# dbinom() evaluates the probability mass function (PMF) of the binomial
# distribution: the probability of exactly x successes in n independent
# trials, each with success probability p.
prob <- dbinom(x, size = n, prob = p)

# Create data frame
binom_df <- data.frame(
  Heads = x,
  Probability = prob
)

# View the dataset
print(binom_df)

   Heads  Probability
1      0 0.0009765625
2      1 0.0097656250
3      2 0.0439453125
4      3 0.1171875000
5      4 0.2050781250
6      5 0.2460937500
7      6 0.2050781250
8      7 0.1171875000
9      8 0.0439453125
10     9 0.0097656250
11    10 0.0009765625
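
As a cross-check, the same probabilities can be reproduced from the binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). The sketch below redefines n, p, and x so that it runs on its own; the names manual_prob, matches_dbinom, and total_prob are introduced here for illustration only.

```r
n <- 10
p <- 0.5
x <- 0:n

# PMF written out by hand: choose(n, x) * p^x * (1 - p)^(n - x)
manual_prob <- choose(n, x) * p^x * (1 - p)^(n - x)

# Compare against dbinom(); all.equal() tolerates floating-point rounding
matches_dbinom <- isTRUE(all.equal(manual_prob, dbinom(x, size = n, prob = p)))
print(matches_dbinom)

# A valid PMF sums to 1 over all possible outcomes
total_prob <- sum(manual_prob)
print(total_prob)
```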

Visualization of the probability mass function (PMF) of the Binomial distribution

# Load ggplot2 for plotting
library(ggplot2)

# Plot the distribution: one bar per number of heads
ggplot(binom_df, aes(x = Heads, y = Probability)) +
  geom_col(fill = "skyblue", color = "black") +
  ggtitle("Binomial Distribution: Tossing a Coin 10 Times") +
  theme_minimal()

Applications of the Plot

Understanding Outcome Likelihoods: It helps to visually grasp which outcomes are most likely. In this example, the bar at 5 heads is the tallest, indicating that getting exactly 5 heads has the highest probability when tossing a fair coin 10 times.
Teaching & Communication: It’s an effective way to teach probability concepts such as:

Symmetry of the binomial distribution when 𝑝 = 0.5

Skewness when 𝑝 ≠ 0.5 (right-skewed for 𝑝 < 0.5, left-skewed for 𝑝 > 0.5)

The “bell-shaped” behavior for larger n

Decision Making & Risk Assessment: In fields like quality control, clinical trials, or marketing, it helps visualize the likelihood of various outcomes, aiding risk evaluation.
Model Checking: In statistics and machine learning, you might compare observed data to a theoretical distribution. Setting observed frequencies against this plot helps judge whether the binomial model is appropriate.
Parameter Sensitivity: You can change n and p and replot to see how the shape of the distribution changes, which is useful for experimentation and simulation.
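
The symmetry and skewness points can also be checked numerically. This is a rough sketch using base R only: the three values of p are arbitrary illustrative choices, and the names pmf_by_p, modes, and is_symmetric are assumptions introduced here.

```r
n <- 10
x <- 0:n
p_values <- c(0.2, 0.5, 0.8)   # illustrative choices, not from the original

# One column of PMF values per value of p
pmf_by_p <- sapply(p_values, function(p) dbinom(x, size = n, prob = p))
colnames(pmf_by_p) <- paste0("p=", p_values)
rownames(pmf_by_p) <- x
print(round(pmf_by_p, 4))

# Most likely number of successes for each value of p
modes <- x[apply(pmf_by_p, 2, which.max)]
print(modes)

# Symmetry check: for p = 0.5 the PMF reads the same forwards and backwards
half <- unname(pmf_by_p[, "p=0.5"])
is_symmetric <- isTRUE(all.equal(half, rev(half)))
print(is_symmetric)
```

With p = 0.2 the mass piles up at low counts and with p = 0.8 at high counts, each the mirror image of the other. Only p = 0.5 is symmetric around n / 2.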

Mean, Variance, and Standard Deviation of the Binomial Distribution

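# Mean of a binomial distribution: n * p
mean_binom <- n * p

# Variance: n * p * (1 - p)
var_binom <- n * p * (1 - p)

# Standard deviation: square root of the variance
sd_binom <- sqrt(var_binom)

cat("Mean:", mean_binom, "\nVariance:", var_binom, "\nStandard Deviation:", sd_binom, "\n")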

Mean: 5 
Variance: 2.5 
Standard Deviation: 1.581139
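
These closed-form values can be confirmed directly from the PMF, since the mean is the probability-weighted average of the outcomes and the variance is the weighted average squared deviation from that mean. The sketch below is self-contained; the *_from_pmf names are introduced here for illustration.

```r
n <- 10
p <- 0.5
x <- 0:n
prob <- dbinom(x, size = n, prob = p)

# Mean: E[X] = sum of x * P(X = x)
mean_from_pmf <- sum(x * prob)

# Variance: Var(X) = sum of (x - E[X])^2 * P(X = x)
var_from_pmf <- sum((x - mean_from_pmf)^2 * prob)

# Standard deviation: square root of the variance
sd_from_pmf <- sqrt(var_from_pmf)

cat("Mean:", mean_from_pmf,
    "\nVariance:", var_from_pmf,
    "\nStandard Deviation:", sd_from_pmf, "\n")
```

The results should agree with n * p and n * p * (1 - p) up to floating-point rounding.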

Applications of the Statistics

Healthcare: Predicting the number of patients who recover after a treatment
Marketing: Estimating response rates to campaigns
Manufacturing: Predicting the number of defective items in a batch
Finance: Estimating the default rate on loans
A/B Testing: Measuring the success of website features
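
As one illustration, take the manufacturing case. The numbers below (a batch of 200 items and a 3% defect rate) are hypothetical values chosen for this sketch, and it assumes items fail independently with the same probability.

```r
# Hypothetical inputs, chosen only for illustration
batch_size  <- 200     # items per batch (assumed)
defect_rate <- 0.03    # probability that a single item is defective (assumed)

# Expected number of defective items per batch: n * p
expected_defects <- batch_size * defect_rate

# Spread around that expectation: sqrt(n * p * (1 - p))
sd_defects <- sqrt(batch_size * defect_rate * (1 - defect_rate))

# Probability that a batch contains no defective items at all
p_no_defects <- dbinom(0, size = batch_size, prob = defect_rate)

cat("Expected defects:", expected_defects,
    "\nStandard deviation:", sd_defects,
    "\nP(no defects):", p_no_defects, "\n")
```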

Cumulative Probabilities (CDF)

binom_df$Cumulative_Probability <- pbinom(x, size = n, prob = p)
print(binom_df)

   Heads  Probability Cumulative_Probability
1      0 0.0009765625           0.0009765625
2      1 0.0097656250           0.0107421875
3      2 0.0439453125           0.0546875000
4      3 0.1171875000           0.1718750000
5      4 0.2050781250           0.3769531250
6      5 0.2460937500           0.6230468750
7      6 0.2050781250           0.8281250000
8      7 0.1171875000           0.9453125000
9      8 0.0439453125           0.9892578125
10     9 0.0097656250           0.9990234375
11    10 0.0009765625           1.0000000000

Applications of Cumulative Probability

It answers “what’s the probability of getting at most k successes?”
Often used to set decision thresholds or critical values (like in hypothesis testing).
Useful when designing tests, setting limits, or evaluating worst-case scenarios.
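
A few typical questions for the 10-flip example can be answered as follows. This is a small sketch: the cutoffs (3, 8, the range 4 to 6, and the 0.95 level) are arbitrary illustrative choices, and the variable names are introduced here.

```r
n <- 10
p <- 0.5

# P(X <= 3): at most 3 heads
p_at_most_3 <- pbinom(3, size = n, prob = p)

# P(X >= 8): at least 8 heads, computed as the upper tail P(X > 7)
p_at_least_8 <- pbinom(7, size = n, prob = p, lower.tail = FALSE)

# P(4 <= X <= 6): difference of two cumulative probabilities
p_between_4_6 <- pbinom(6, size = n, prob = p) - pbinom(3, size = n, prob = p)

# Smallest k with P(X <= k) >= 0.95, a threshold-style question
k_95 <- qbinom(0.95, size = n, prob = p)

cat("P(X <= 3):", p_at_most_3,
    "\nP(X >= 8):", p_at_least_8,
    "\nP(4 <= X <= 6):", p_between_4_6,
    "\nSmallest k with P(X <= k) >= 0.95:", k_95, "\n")
```

Note that pbinom() with lower.tail = FALSE returns P(X > q), so "at least 8" is requested with q = 7.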

Flexible for experimentation

It’s easy to explore how changing the following parameters affects the distribution’s shape and probabilities.

Number of trials (size)

Probability of success (prob)
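
One way to experiment is to wrap the table-building steps in a small function. The helper below, binom_table(), is a hypothetical name introduced for this sketch and is not an existing R function; the second parameter set (20 trials, p = 0.3) is an arbitrary example.

```r
# Hypothetical helper: PMF and CDF table for any number of trials and
# any probability of success
binom_table <- function(size, prob) {
  successes <- 0:size
  data.frame(
    Successes = successes,
    Probability = dbinom(successes, size = size, prob = prob),
    Cumulative_Probability = pbinom(successes, size = size, prob = prob)
  )
}

# The fair-coin example, plus a different setting for comparison
fair_10   <- binom_table(size = 10, prob = 0.5)
biased_20 <- binom_table(size = 20, prob = 0.3)

print(head(biased_20))
```

Passing either table to the earlier plotting code (with the x aesthetic set to Successes) would show how the shape shifts as size and prob change.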

Conclusion

The binomial distribution is a powerful and widely used tool in statistics for modeling binary outcomes in repeated trials. In R, functions like dbinom() and pbinom() make it easy to calculate point and cumulative probabilities, while visualization tools such as ggplot2 help in understanding the shape and behavior of the distribution.

By adjusting the number of trials and the probability of success, we can explore different real-world scenarios, from clinical trials and marketing campaigns to quality control processes. Its mean, variance, and standard deviation summarize where outcomes are expected to fall and how much they vary, which supports better-informed decisions.